Abstract

Motivated as a null model for comparison with data, we study the 
following model for a phylogenetic tree on n extant species. The origin 
of the clade is a random time in the past, whose (improper) distribution 
is uniform on $(0, \infty)$. After that origin, the process of extinctions and
speciations is a continuous-time critical branching process of constant 
rate, conditioned on having the prescribed number n of species at the 
present time. We study various mathematical properties of this model 
in the $n \to \infty$ limit: time of origin and of most recent common ancestor;
pattern of divergence times within lineage trees; time series of numbers 
of species; number of extinct species in total, or ancestral to extant 
species; and "local" structure of the tree itself. We emphasize several 
mathematical techniques: associating walks with trees, a point process 
representation of lineage trees, and Brownian limits. 
1 Introduction



1.1 A big picture 

This paper forms part of a larger project, which we first outline. There is a 
substantial literature on comparing data on different aspects of biodiversity 
or macroevolution - the evolutionary history of speciations and extinctions 
- with the predictions of simple "pure chance" stochastic models. Available 
data includes 

• fossil time series - fluctuations in number of taxa over time; 

• shapes of phylogenetic trees on extant species (Mooers and Heard [?]
provide an extensive survey); 

• the distribution of number of species per genus. 

The fit of simple models, and of more elaborate models incorporating con- 
jectured biological process, have been studied in these contexts. While 
data-motivated models are scientifically natural, a mathematical aesthetic 
suggests a somewhat different approach: start with a "pure chance" model 
which encompasses simultaneously all the kinds of data that one might hope 
to find. Here are two instances of what one would like such a model to pro- 
vide. 

• Joint description of the phylogenetic tree on an extant clade of species, 
its extension to the tree on an observed small proportion of extinct species, 
and the (unobserved) entire tree on all extinct species. 

• Joint description of fossil time series at different levels of the taxonomic 
hierarchy. 

We emphasize the latter because paleontology literature tends to assume 
that a model can be applied at any level, without enquiring whether this 
assumption is logically self-consistent. 

Our purpose in the larger project is to present what is arguably the math- 
ematically fundamental such model. The underlying model at the species 
level is simple - a critical branching process conditioned to have n lineages at 
the present time. This is the subject of the present paper, which is the part 
of the project most closely related to classical and contemporary applied 
probability. In a subsequent paper aimed at a more biological audience we 
will describe how the model extends to higher-order taxa by assuming each 
new species has some probability of founding a new higher-order taxon; we 
will consider several explicit classification schemes emphasizing desiderata 
such as monophyletic groups. 

Conceptually, this is a neutral model which does not incorporate con- 
jectured biological process such as intrinsic tendency for species numbers to 
increase, differential speciation or extinction rates, or ecological constraints
on numbers of species. For well-understood mathematical reasons (see sec- 
tion 121) neutral models like ours are implausible for large clades. In a sense, 
the model seems most appropriate as a "null hypothesis" for small clades, 
at the recent fringe of the Tree of Life, or for a geological period free of mass 
extinctions and their aftermath. 

Biological questions motivating our model, and suggested by the results 
of analysis of our model, will be treated in detail elsewhere, so we give just 
a brief mention here. Phylogenetic trees on extant species are nowadays 
based on molecular data; technical aspects of tree reconstruction form a 
large and important subject [?, 22]. But that is not our focus. Let us
assume that in the near future we will have a large database of essentially 
correct phylogenetic trees, and also assume these include the time points of 
divergence of lineages (rather than just the "shape" of the tree). How might 
one use such a database? 

(i) Inference about a particular clade. If we have no direct knowledge about 
extinct species, then we cannot observe past fluctuations of number of species 
with time, and cannot observe the time of origin of the clade (typically 
longer ago than the observable time of most recent common ancestor of 
extant species). Inference about such quantities requires some stochastic 
model; given a model, one can use the observed phylogenetic tree to make 
inferences. 

(ii) Statistical properties of phylogenetic trees in general. In what systematic 
way do real phylogenetic trees differ from predictions of a simple model like 
ours which treats macroevolution as a purely random process, and what is 
the biological significance of such differences? 

1.2 Standard models 

Ours is, roughly speaking, the third simplest model one might devise, so let 
us first recall the two simpler models. 

The Yule model. Yule [?] proposed the basic model for speciations without
extinctions. Initially there is one species. Thereafter, independently for
each existing species, new species originate as "daughter" species at constant
rate $\lambda$ (i.e. at the times of a Poisson (rate $\lambda$) process). So for given $n$ one
can get a model for an n-species tree by taking the present as a random time 
at which the number of species equals n. (The associated continuous-time 
Markov chain counting number of species is often called the Yule process, 
though its origin as a model for species is often forgotten.) 
The Moran/coalescent model. These models, developed and extensively
used in population genetics, can also be applied to macroevolution
(see e.g. [?]). In the Moran model ([?] sec. 3.3) the number of coexisting
species is fixed at $n$. At successive discrete times, one randomly chosen
species goes extinct and another randomly chosen one speciates. Implicit in 
this model (run from the indefinite past until the present) is a model for the 
phylogenetic tree on the n extant species; for large n, with suitable rescal- 
ing of the time unit, the phylogenetic tree approximates the much-studied 
continuous-time coalescent model. To describe the coalescent model, we run 
time backwards from the present, starting with n "lines of descent"; in a 
time interval dt, each pair of lines of descent has chance dt to merge ("co- 
alesce") into one line, and we continue until reaching a single most recent 
common ancestor. See [?] for a recent survey.

Why a third model? Obviously many basic inference questions men- 
tioned earlier - about fluctuations in past numbers of (extinct) species, for 
instance - are not satisfactorily handled within the models above. Biologists 
have studied more elaborate models, mostly in one of two categories. (We 
will give a more detailed account elsewhere, but the bottom line is that the 
actual fit of real-world data to parametric models has not been studied as 
definitively as one might have expected.) Exponential growth models are 
exemplified by the linear birth-and-death chain model for species numbers
($\lambda_i = \lambda i$, $\mu_i = \mu i$). This leads to a model with 3 parameters $(\lambda, \mu, t^*)$,
where $t^*$ is the time of origin of the clade. Logistic stochastic models posit a logistic-shaped
curve for species numbers, and also require 3 or 4 parameters to
specify. In contrast, the model we study (described carefully in Section 2)
has only 1 parameter (mean species lifetime). It is this simplicity, and the 
desire to avoid the particular biological presumptions underlying exponential 
growth or logistic type models, that motivates our particular model. 

1.3 Outline of results 

A clade is the set of all species which are descendants of some (typically 
extinct) species. The succinct description (to be elaborated in Section 2) of
our model for a phylogenetic tree $\mathcal{T}_n$ on a clade with $n$ extant species is as
follows. 

The origin of the clade is a random time in the past, whose 
(improper) distribution is uniform on $(0, \infty)$. After that origin,
the process of extinctions and speciations is a continuous-time 
critical branching process of constant rate, conditioned on having
the prescribed number n of species at the present time. 

The conditioning is conceptually important: given a real phylogenetic tree 
on 23 extant species, we want to be able to compare it to the predictions 
of a stochastic model generating trees on exactly 23 extant species. The 
uniform prior (for time of origin) avoids the necessity to introduce a second 
parameter into the model. 

There is a vast mathematical literature on branching processes, but we 
haven't found detailed discussion of any very similar model. In the biological
literature, Wollenberg et al. [?] give a simulation study of a similar
model. On the other hand, this model is clearly open to analysis by the 
known techniques of applied probability. We exploit one particular modern 
approach to classical branching process theory: representing trees as walks. 
See Pitman [19] for a recent survey. This both leads to an exact "point process"
description of lineage tree distributions (Proposition 1) and permits
us to study asymptotics via weak convergence to Brownian motion. This 
methodology is known to specialists in other aspects of random trees but 
is perhaps less familiar in the subject of phylogenetic trees, so we try to 
explain the key ideas carefully even though they are not entirely new. 

Our results describe distributional properties of various aspects of the 
tree $\mathcal{T}_n$.

• The lineage tree, via exact formulas (Proposition 1), "global limits"
(Corollaries 3 and 4), and "local limits" (Corollaries 6 and 7).

• The time series of number of species (Lemma 8), the maximum number
of coexisting species (Corollary 9), and the total number of extinct
species (Corollary 10).

• The local limit structure of the complete (i.e. including extinct species)
phylogenetic tree, relative to either a typical extant species (Proposition ??)
or a typical extinct species (Proposition 18).

• The joint distribution of time of origin of clade and time of most recent
common ancestor (Corollary 5), joint also with the number of species
alive at the time of most recent common ancestor (Corollary 14).

• The number of extinct species ancestral to some extant species (Corollary 17).

Finally we should admit that the whole paradigm of studying $n \to \infty$
asymptotics is rather unnatural, because the model is biologically unrealistic
for large $n$, but one can hope that the approximations implicit in asymptotic
results are qualitatively correct for smaller values of n. Our web site 
shows Monte Carlo simulations for n = 8, 12, 20 with 10 repetitions. One 
can check that numerical values are broadly consistent with the asymptotic 
predictions. 



2 Model and notation 

In stating and deriving mathematical results we use the traditional language 
of branching processes (individuals, children, births, . . . ) even though we 
are envisaging species and so should be writing (species, daughter species, 
speciations, . . .). 

Let $\mathcal{T}$ be a continuous-time critical branching process (CBP) starting
with one individual. In this process each individual lives for an
Exponential(rate $\lambda$) time, for some $\lambda > 0$, during which it gives birth at the
times of an independent Poisson(rate $\lambda$) process. After birth all individuals
behave independently of each other. We can and will scale time so that
$\lambda = 1$; so the time unit is interpreted as mean species lifetime.

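Write $N_{\mathcal{T}}(t)$ for the number of individuals alive at time $t$ after the
origin of $\mathcal{T}$. A classical result ([9] §XVII.10.11) gives a modified Geometric
distribution

$$
\mathbb{P}(N_{\mathcal{T}}(t) = n) =
\begin{cases}
\dfrac{t^{n-1}}{(1+t)^{n+1}}, & n \ge 1, \\[1ex]
\dfrac{t}{1+t}, & n = 0.
\end{cases}
\qquad (1)
$$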
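As a quick sanity check of (1), the following minimal Python sketch (our own illustration, not taken from the source; the function names, seed and sample size are arbitrary choices) simulates the population size of $\mathcal{T}$ with $\lambda = 1$ through its birth-death jump chain and compares empirical frequencies with the modified Geometric law.

```python
import random

def cbp_population(t, rng):
    """Population size at time t of the critical branching process (lambda = 1)
    started from one individual. In state k the next event comes after an
    Exponential(2k) time and is a birth or a death with probability 1/2 each."""
    k, clock = 1, 0.0
    while k > 0:
        clock += rng.expovariate(2.0 * k)
        if clock > t:
            break                      # no further event before time t
        k += 1 if rng.random() < 0.5 else -1
    return k

def pmf(n, t):
    """Modified Geometric law (1) of N_T(t)."""
    if n == 0:
        return t / (1.0 + t)
    return t ** (n - 1) / (1.0 + t) ** (n + 1)

rng = random.Random(1)
t, runs = 1.5, 40_000
counts = {}
for _ in range(runs):
    k = cbp_population(t, rng)
    counts[k] = counts.get(k, 0) + 1

for n in range(4):
    print(n, counts.get(n, 0) / runs, round(pmf(n, t), 4))
```

The Gillespie-style loop is used because the population size alone is a Markov chain; the individual lifetimes never need to be tracked.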



Write $\mathcal{T}_{t,n}$ for the process $\mathcal{T}$ originating at time $t$ in the past and conditioned
on having exactly $n$ individuals at the present time. Within a process like
$\mathcal{T}_{t,n}$ or $\mathcal{T}_n$ below, we use the notational convention that "time $s$" means time
$s$ before present. Thus within $\mathcal{T}_{t,n}$ the time parameter $s$ decreases from $t$
to 0, meaning that time increases from time $t$ before present to time 0, the
present time. Our previous verbal definition of our model $\mathcal{T}_n$ as a Bayes
posterior (for $\mathcal{T}$ started at a uniform past time $t \in (0, \infty)$ and conditioned
on having $n$ individuals at the present time 0) now becomes the following
rigorous definition. Fix $n \ge 1$.

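$$
\mathbb{P}(\mathcal{T}_n \in \cdot) = \frac{\int_0^\infty \mathbb{P}(\mathcal{T}_{t,n} \in \cdot)\,\mathbb{P}(N_{\mathcal{T}}(t) = n)\,dt}{\int_0^\infty \mathbb{P}(N_{\mathcal{T}}(t) = n)\,dt}.
$$

Using (1) and the calculus fact

$$
\int_0^\infty \frac{t^{n-1}}{(1+t)^{n+1}}\,dt = \frac{1}{n}
$$

turns this into

$$
\mathbb{P}(\mathcal{T}_n \in \cdot) = \int_0^\infty \mathbb{P}(\mathcal{T}_{t,n} \in \cdot)\,\frac{n t^{n-1}}{(1+t)^{n+1}}\,dt. \qquad (2)
$$

Within the random tree $\mathcal{T}_n$, the time parameter $s$ decreases from a random
"time of origin" $T^{\mathrm{or}}_n$ to 0, where by the formula above $T^{\mathrm{or}}_n$ has density
function

$$
q_n(t) = \frac{n t^{n-1}}{(1+t)^{n+1}}, \qquad t > 0. \qquad (3)
$$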
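A consequence of (3) worth making explicit: with the substitution $u = t/(1+t)$ the density $q_n(t)\,dt$ becomes $n u^{n-1}\,du$ on $(0,1)$, so $\mathbb{P}(T^{\mathrm{or}}_n \le t) = (t/(1+t))^n$ and $T^{\mathrm{or}}_n$ can be drawn exactly by inversion. The Python sketch below rests on this derivation (ours, not the source's) and also checks the calculus fact numerically; the helper names, seed and sample sizes are hypothetical choices.

```python
import random

def sample_origin(n, rng):
    """Exact draw of T^or_n, density q_n(t) = n t^(n-1) / (1+t)^(n+1).
    U = T/(1+T) has density n u^(n-1) on (0,1), so U = V**(1/n) with V uniform."""
    u = rng.random() ** (1.0 / n)
    return u / (1.0 - u)

def origin_cdf(t, n):
    """P(T^or_n <= t) = (t/(1+t))**n."""
    return (t / (1.0 + t)) ** n

def calculus_fact(n, steps=100_000):
    """Midpoint rule for int_0^inf t^(n-1)/(1+t)^(n+1) dt after t = u/(1-u),
    dt = du/(1-u)^2; the exact value is 1/n."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        t = u / (1.0 - u)
        total += t ** (n - 1) / (1.0 + t) ** (n + 1) / (1.0 - u) ** 2 * h
    return total

rng = random.Random(2)
n = 5
draws = [sample_origin(n, rng) for _ in range(50_000)]
emp = sum(d <= 4.0 for d in draws) / len(draws)
print(emp, origin_cdf(4.0, n))        # empirical vs exact CDF at t = 4
print(calculus_fact(n), 1.0 / n)      # numerical integral vs 1/n
```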

We shall refer to $\mathcal{T}_n$ and $\mathcal{T}_{t,n}$ as the complete trees. Returning to biological
terminology, a complete tree records the birth time of every (extinct or
extant) species in a clade and the extinction time of each extinct species. Every
realization of a complete tree also uniquely determines a realization of a
lineage tree of the extant species. This is the smallest subtree of the complete
tree that contains all the divergence times for pairs of lineages of extant
species, without recording which ancestral species contain the lineage. We
let $\mathcal{A}_{t,n}$ and $\mathcal{A}_n$ denote the lineage trees of $\mathcal{T}_{t,n}$ and $\mathcal{T}_n$ respectively. The
time parameter $s$ within $\mathcal{A}_n$ decreases from the time $T^{\mathrm{mrca}}_n$ of most recent
common ancestor of the $n$ extant species to the present time 0. (The lineage
tree is what is usually called the phylogenetic tree, though logically all the
trees under consideration are different kinds of phylogenetic tree.)



3 Point process representations of lineage trees 
3.1 An exact description 

It is perhaps remarkable that there is a useful exact description of the lineage
tree $\mathcal{A}_{t,n}$, based on a certain point process representation illustrated in Figure
1. Consider an arbitrary lineage tree on $n$ species. Draw the tree as in Figure
1; recursively from the top down, at each divergence point of lineages choose
randomly which branch is drawn on the left and which on the right. After
drawing the tree, label species as $1, 2, \ldots, n$ in left-to-right order. Each
divergence of lineages involves adjacent contiguous blocks of species, say
$\{i, i+1, \ldots, j\}$ and $\{j+1, j+2, \ldots, k\}$, and occurs at some time $s$. We
mark the occurrence of this divergence by a mark × at coordinates $(j + \tfrac12, s)$
and then draw the combined lineage as a vertical line upwards from the
mark.



Figure 1. The point process representation of a lineage tree on $n = 10$ species. (Vertical axis: time.)

The advantage of this precise way of drawing the tree is that one can clearly 
reconstruct the tree from the coordinates $\{(i + \tfrac12, s_i),\ 1 \le i \le n-1\}$ of the
marks. So the distribution of the point process of marks serves to specify 
the distribution of the lineage tree. 

Proposition 1 ([20] Lemma 3). Fix $n \ge 2$ and $t > 0$. The point process
$\{(i + \tfrac12, h_i),\ 1 \le i \le n-1\}$, where the $(h_i)$ are i.i.d. with density function

$$
f_t(s) = (1 + t^{-1})(1+s)^{-2}, \qquad 0 < s < t, \qquad (4)
$$

represents the lineage tree $\mathcal{A}_{t,n}$ within the complete tree $\mathcal{T}_{t,n}$.



The derivation of this result will be explained in Section 5, where the
underlying contour process is exploited further.
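Proposition 1 makes $\mathcal{A}_{t,n}$ easy to simulate. The Python sketch below (helper names hypothetical) draws the mark heights by inverting the distribution function $(1 + t^{-1})\,s/(1+s)$ of (4), reads off the divergence time of species $i < j$ as the highest mark between them (our reading of the drawing convention of Figure 1), and counts lineages at time $s$ as one plus the number of marks above $s$. The tail $\mathbb{P}(h_i > s) = (t-s)/(t(1+s))$ used for comparison is obtained by integrating (4).

```python
import random

def sample_marks(n, t, rng):
    """Heights h_1..h_{n-1}, i.i.d. with density f_t(s) = (1 + 1/t)(1 + s)^(-2)
    on (0, t). The CDF is (1 + 1/t) s/(1 + s), so s = a/(1 - a), a = V t/(1 + t)."""
    heights = []
    for _ in range(n - 1):
        a = rng.random() * t / (1.0 + t)
        heights.append(a / (1.0 - a))
    return heights

def mrca_time(heights, i, j):
    """Divergence time of species i < j (1-based): the highest mark between them."""
    return max(heights[i - 1:j - 1])

def lineages_at(heights, s):
    """Species i and i+1 lie in different lineages at time s iff the mark between
    them is above s, so the lineage count is 1 + #{marks above s}."""
    return 1 + sum(h > s for h in heights)

def F(t, s):
    """P(h > s) = (t - s) / (t (1 + s)) for 0 <= s <= t, and 0 for s >= t."""
    return max(t - s, 0.0) / (t * (1.0 + s))

rng = random.Random(3)
n, t, s = 10, 3.0, 0.5
samples = [lineages_at(sample_marks(n, t, rng), s) for _ in range(20_000)]
print(sum(samples) / len(samples), 1 + (n - 1) * F(t, s))   # both close to 6
```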

We are mostly concerned with the lineage tree $\mathcal{A}_n$, which by (2) has a
mixture representation

$$
\mathbb{P}(\mathcal{A}_n \in \cdot) = \int_0^\infty \mathbb{P}(\mathcal{A}_{t,n} \in \cdot)\, q_n(t)\, dt \qquad (5)
$$



where $q_n(t)$ is the density function (3) of $T^{\mathrm{or}}_n$. One can get exact formulas
for various attributes of $\mathcal{A}_n$. Consider for instance the number of lineages
at time $s$. Because each divergence at a time greater than $s$ contributes one
extra lineage at time $s$, it is clear that within $\mathcal{A}_{t,n}$ this number of lineages is
distributed as

$$
1 + \mathrm{Binomial}(n-1, F_t(s))
$$

for

$$
F_t(s) = \int_s^t f_t(u)\, du = \frac{t-s}{t(1+s)}, \qquad 0 \le s \le t.
$$

Thus within $\mathcal{A}_n$ the distribution is the mixture of Binomials implied by (5).
Similarly, the exact distribution of the time $T^{\mathrm{mrca}}_n$ of most recent common
ancestor is

$$
\mathbb{P}(T^{\mathrm{mrca}}_n \le u) = \int_0^\infty (1 - F_t(u))^{n-1}\, q_n(t)\, dt,
$$

with the convention $F_t(u) = 0$ for $u \ge t$.
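The exact formula for $T^{\mathrm{mrca}}_n$ is a one-dimensional integral, so it can be evaluated directly. A minimal Python sketch, assuming a midpoint rule after the substitution $v = t/(1+t)$ (under which $q_n(t)\,dt = n v^{n-1}\,dv$), compared against a Monte Carlo draw of $T^{\mathrm{or}}_n$ followed by the marks of Proposition 1; the names, grid size and sample size are illustrative choices.

```python
import random

def F(t, s):
    """F_t(s) = P(h > s) for one mark of Proposition 1; zero once s >= t."""
    return max(t - s, 0.0) / (t * (1.0 + s))

def mrca_cdf_exact(u, n, steps=100_000):
    """P(T^mrca_n <= u) = int_0^inf (1 - F_t(u))^(n-1) q_n(t) dt, via the
    substitution v = t/(1+t), under which q_n(t) dt = n v^(n-1) dv on (0,1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        t = v / (1.0 - v)
        total += (1.0 - F(t, u)) ** (n - 1) * n * v ** (n - 1) * h
    return total

def mrca_cdf_mc(u, n, runs, rng):
    """Monte Carlo: draw T^or_n exactly, then n-1 marks from (4); T^mrca_n is the top mark."""
    hits = 0
    for _ in range(runs):
        v = rng.random() ** (1.0 / n)
        t = v / (1.0 - v)                     # T^or_n
        top = 0.0
        for _ in range(n - 1):
            a = rng.random() * t / (1.0 + t)  # inverse CDF of f_t
            top = max(top, a / (1.0 - a))
        hits += top <= u
    return hits / runs

rng = random.Random(4)
n, u = 10, 5.0
exact = mrca_cdf_exact(u, n)
approx = mrca_cdf_mc(u, n, 20_000, rng)
print(exact, approx)
```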

In this paper we focus on $n \to \infty$ asymptotics, which may give more conceptual
insight than do complicated exact formulas. As we see below, it is
useful to distinguish two kinds of asymptotics: global limits refer to times of 
order n, whereas local limits refer to times of order 1. 

3.2 The global limit point process 

From the formula (3) for $q_n(t)$ we calculate: if $t_n/n \to t > 0$ then

$$
n q_n(t_n) = \frac{n^2}{t_n(1+t_n)} \left(1 - \frac{1}{1+t_n}\right)^{n} \to t^{-2} e^{-1/t}.
$$

The limit is the density function of the Inverse Exponential IE(1) distribution,
that is to say of $1/\xi$ where $\xi$ has Exponential(1) distribution. So we
have shown

Lemma 2.

$$
n^{-1} T^{\mathrm{or}}_n \xrightarrow{d} T^{\mathrm{or}}, \text{ say},
$$

where the limit $T^{\mathrm{or}}$ has IE(1) distribution.
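Lemma 2 can also be seen from the exact distribution function: integrating (3) gives $\mathbb{P}(T^{\mathrm{or}}_n \le t) = (t/(1+t))^n$ (differentiate to check), so $\mathbb{P}(n^{-1} T^{\mathrm{or}}_n \le x) = (nx/(1+nx))^n \to e^{-1/x}$, the IE(1) distribution function. A few lines of Python tabulate this convergence (function names are ours).

```python
import math

def scaled_origin_cdf(x, n):
    """P(T^or_n / n <= x) = (n x / (1 + n x))**n."""
    return (n * x / (1.0 + n * x)) ** n

def ie1_cdf(x):
    """CDF of IE(1), the law of 1/xi with xi ~ Exponential(1): exp(-1/x)."""
    return math.exp(-1.0 / x)

for n in (10, 100, 1000, 10_000):
    print(n, [round(scaled_origin_cdf(x, n), 4) for x in (0.5, 1.0, 2.0)])
print("IE(1)", [round(ie1_cdf(x), 4) for x in (0.5, 1.0, 2.0)])
```

The error is of order $1/n$, which is consistent with the hope expressed in the introduction that the asymptotics are already informative for moderate $n$.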

Now reconsider Figure 1. To obtain a global limit we want to rescale time
by a factor $n$, and we want to rescale the left-to-right positions of marks to
fit into the unit interval $[0, 1]$, implying they also must be rescaled by a factor
$n$. Thus the original point process of marks $\{(i + \tfrac12, s_i),\ 1 \le i \le n-1\}$ is
rescaled to $\{((i + \tfrac12)/n, s_i/n),\ 1 \le i \le n-1\}$. In the setting of Proposition 1 the
relevant calculation is:

$$
\text{if } s_n/n \to s > 0 \text{ and } t_n/n \to t > s \text{ then } n^2 f_{t_n}(s_n) \to s^{-2},
$$

and the following limit behavior is intuitively clear.
Corollary 3 ([20] Lemma 4, Theorem 5). Let $t_n/n \to t > 0$. The
rescaled point process $\{((i + \tfrac12)/n, s_i/n),\ 1 \le i \le n-1\}$ associated with the lineage
tree $\mathcal{A}_{t_n,n}$ converges in distribution to the Poisson point process ($\pi_{1,t}$, say)
whose intensity measure is $\nu(dl \times ds) = dl\, s^{-2}\, ds\, \mathbf{1}_{[0,1] \times (0,t)}$.

The limit $\pi_{1,t}$, illustrated in Figure 2, has an infinite number of points
close to the lower boundary, but weak convergence on the open interval $(0, t)$
means convergence over regions away from this boundary. Figure 2 indicates
visually how the Poisson point process limit defines a limit random tree
which is a kind of "continuum tree" with a lineage for each real $l \in (0, 1)$,
though we do not seek to formalize this idea.
Figure 2. The point process $\pi_{1,t}$ on the right represents the lineage tree of a
continuum of species on the left.

The mixture representation (5) and Corollary 3 immediately imply a global
limit theorem for $\mathcal{A}_n$. To state it, let $T^{\mathrm{or}}$ have IE(1) distribution. Define a
Cox point process ($\pi_1$, say) on $(0, 1) \times (0, \infty)$ as follows: given $T^{\mathrm{or}} = t$, let
$\pi_1$ be a Poisson point process with the law of $\pi_{1,t}$.

Corollary 4. The rescaled point process $\{((i + \tfrac12)/n, s_i/n),\ 1 \le i \le n-1\}$ associated
with the lineage tree $\mathcal{A}_n$, considered jointly with $n^{-1} T^{\mathrm{or}}_n$, converges in
distribution to the Cox point process $\pi_1$, considered jointly with $T^{\mathrm{or}}$.

Here is a quick application of this global limit theorem. 
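Corollary 5. The limit joint behavior of $T^{\mathrm{or}}_n$ and $T^{\mathrm{mrca}}_n$ is given by

$$
(n^{-1} T^{\mathrm{or}}_n,\ n^{-1} T^{\mathrm{mrca}}_n) \xrightarrow{d} (T^{\mathrm{or}}, T^{\mathrm{mrca}}),
$$

where the limit law has joint density

$$
f_{T^{\mathrm{or}}, T^{\mathrm{mrca}}}(t, s) = t^{-2} s^{-2} e^{-1/s}, \qquad 0 < s < t.
$$

The marginal density of $T^{\mathrm{mrca}}$ is

$$
f_{T^{\mathrm{mrca}}}(s) = s^{-3} e^{-1/s}, \qquad s > 0.
$$

The limit joint distribution can alternatively be expressed as
$(T^{\mathrm{or}}, T^{\mathrm{mrca}}) \stackrel{d}{=} (1/\xi_1,\ 1/(\xi_1 + \xi_2))$, where $\xi_1, \xi_2$ are i.i.d. Exponential(1).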



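Proof. Corollary 4 implies convergence in distribution to the limit $(T^{\mathrm{or}}, T^{\mathrm{mrca}})$,
in which $T^{\mathrm{mrca}}$ is defined as the maximum height (that is, maximum second
coordinate) of any point of $\pi_1$. Given that $T^{\mathrm{or}} = t$, $\pi_1$ is distributed as a Poisson
point process $\pi_{1,t}$ with intensity measure $\nu(dl \times ds) = dl\, s^{-2}\, ds\, \mathbf{1}_{[0,1] \times (0,t)}$.
Consequently, for the conditional law of $T^{\mathrm{mrca}}$ given $T^{\mathrm{or}} = t$ we have

$$
\mathbb{P}(T^{\mathrm{mrca}} < s \mid T^{\mathrm{or}} = t) = \mathbb{P}\big(\pi_{1,t} \cap [0,1] \times (s,t) = \emptyset\big) = \exp\Big(-\int_s^t u^{-2}\, du\Big) = e^{1/t - 1/s}, \qquad 0 < s < t.
$$

So

$$
\mathbb{P}(T^{\mathrm{mrca}} < s,\ T^{\mathrm{or}} \in dt) = e^{1/t - 1/s}\, \mathbb{P}(T^{\mathrm{or}} \in dt) = t^{-2} e^{-1/s}\, dt, \qquad 0 < s < t,
$$

implying the formula for the joint density. The remaining calculations are
straightforward. □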
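The representation $(1/\xi_1, 1/(\xi_1+\xi_2))$ in Corollary 5 gives an easy simulation check. Integrating the stated densities gives $\mathbb{P}(T^{\mathrm{mrca}} \le s) = (1 + 1/s)e^{-1/s}$ and, for $s < t$, $\mathbb{P}(T^{\mathrm{mrca}} \le s, T^{\mathrm{or}} \le t) = e^{-1/s}(1 + 1/s - 1/t)$; these closed forms are our own calculation. The Python sketch below compares them with simulated pairs, and also with the finite model at $n = 200$ (drawing $T^{\mathrm{or}}_n$ via $T/(1+T) \stackrel{d}{=} V^{1/n}$, $V$ uniform, and then the marks of Proposition 1). Seeds and sample sizes are arbitrary.

```python
import math
import random

def exp1(rng):
    """Exponential(1) draw, redrawn in the (probability ~2**-53) event that it is 0."""
    x = 0.0
    while x == 0.0:
        x = rng.expovariate(1.0)
    return x

rng = random.Random(5)
runs = 50_000
pairs = []
for _ in range(runs):
    xi1, xi2 = exp1(rng), exp1(rng)
    pairs.append((1.0 / xi1, 1.0 / (xi1 + xi2)))       # (T^or, T^mrca)

s, t = 1.0, 2.0
marginal = sum(m <= s for _, m in pairs) / runs
joint = sum(m <= s and o <= t for o, m in pairs) / runs
marginal_exact = (1.0 + 1.0 / s) * math.exp(-1.0 / s)          # 2/e at s = 1
joint_exact = math.exp(-1.0 / s) * (1.0 + 1.0 / s - 1.0 / t)  # 1.5/e at (1, 2)

# Finite model, n = 200: exact T^or_n, then n-1 marks from (4); compare T^mrca_n / n.
n, finite_runs, hits = 200, 4_000, 0
for _ in range(finite_runs):
    v = rng.random() ** (1.0 / n)
    t_or = v / (1.0 - v)
    top = 0.0
    for _ in range(n - 1):
        a = rng.random() * t_or / (1.0 + t_or)
        top = max(top, a / (1.0 - a))
    hits += top <= s * n
finite = hits / finite_runs
print(marginal, joint, finite, marginal_exact, joint_exact)
```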



3.3 The local limit point process 

There is a different limit regime in which time is not rescaled. This tells us 
the local structure of the lineage tree relative to a given typical species, where 
"local" refers to lineages merging with the given lineage within bounded 
time. The relevant calculation is that, in the setting of Proposition 1, if
$t_n \to \infty$ then

$$
f_{t_n}(s) \to f(s) := (1+s)^{-2}, \qquad 0 < s < \infty.
$$
Consider the point process on $(\mathbb{Z} + \tfrac12) \times (0, \infty)$ consisting of points
$\{(i + \tfrac12, \eta_i),\ i \in \mathbb{Z}\}$ for i.i.d. $(\eta_i)$ with density $f(s) = (1+s)^{-2}$. As illustrated
in Figure 3, the point process defines an infinite tree, $\mathcal{A}_\infty$ say, on lineages
labeled by $\mathbb{Z}$.
Figure 3. A realization of part of $\mathcal{A}_\infty$, approximating the local structure of
$\mathcal{A}_n$ for large $n$. The 2 visible ancestral lineages diverged at around time 16.

Proposition 1 and the calculation above easily imply the first assertion below;
the second assertion follows from the mixture representation (5), where in
this setting the mixing makes no difference.

Corollary 6. Let $t_n \to \infty$ and let $U_n$ be uniform on $\{1, 2, \ldots, n\}$, independent
of $\mathcal{A}_{t_n,n}$. Write $\{(i + \tfrac12, s_{U_n+i}),\ i \in \mathbb{Z}\}$ for the point process
associated with the lineage tree $\mathcal{A}_{t_n,n}$ centered at lineage $U_n$, where $s_j = \infty$
for $j$ outside $[1, n-1]$. Then as $n \to \infty$ this point process converges in distribution
to the point process $\{(i + \tfrac12, \eta_i),\ i \in \mathbb{Z}\}$ defining $\mathcal{A}_\infty$. The same result
holds for $\mathcal{A}_n$.

Less formally, the structure of $\mathcal{A}_\infty$ around a fixed lineage provides an asymptotic
approximation to the structure of $\mathcal{A}_n$ around a random lineage.
3.4 Some local calculations

We record some elementary calculations within $\mathcal{A}_\infty$, reflecting aspects of the
$n \to \infty$ behavior of the lineage trees $\mathcal{A}_n$. For a lineage at time $s$ we call the
present (time 0) number of species descending from this lineage the size of
this lineage. The $n \to \infty$ limit of $n^{-1}$(number of lineages in $\mathcal{A}_n$ at time $s$)
is what we will call the density of lineages in $\mathcal{A}_\infty$ at time $s$.

Corollary 7. [Some calculations for $\mathcal{A}_\infty$.]

(a) The density of ancestral lineages at time $s$ in the past equals $(1+s)^{-1}$,
and the size of a random lineage at time $s$ has Geometric($(1+s)^{-1}$) distribution
on $\{1, 2, \ldots\}$;

(b) the rate of lineages merging as $s$ increases (time runs backwards) is
$m(s) = 2(1+s)^{-1}$; and given that this event occurs at $s$ for some lineage,
the size of the lineage it merges with has Geometric($(1+s)^{-1}$) distribution;

(c) as $s$ decreases (time runs forward) the rate at which a lineage of size
$k > 1$ branches at time $s$ is $b_k(s) = (k-1)(s(1+s))^{-1}$, and the size of the
lineage produced on the left of the branchpoint has Uniform distribution on
$\{1, \ldots, k-1\}$.

Proof. (a) The density of ancestral lineages at time $s$ in the past is just the
density of branching points at times greater than $s$,

$$
G(s) := \int_s^\infty f(u)\, du = (1+s)^{-1}.
$$

Hence the number of extant species descended from a "typical" lineage at
time $s$ has Geometric($(1+s)^{-1}$) distribution, since this is the distribution of
distances between branchpoints at heights greater than $s$.

(b) As $s$ increases (time runs backwards) the rate at which a given lineage
merges with another lineage is

$$
m(s) = 2\,\frac{f(s)}{G(s)} = \frac{2}{1+s},
$$

because such a merger occurs in $[s, s+ds]$ when one of the two branchpoints
separating the given lineage from its neighboring lineages, which must be at
height $> s$, occurs during $[s, s+ds]$, and this has chance $f(s)\,ds/G(s)$ for each
branchpoint. Moreover, if a lineage merges at $s$ then (independent of the size
of the first lineage) the size of the second lineage has the Geometric($(1+s)^{-1}$)
distribution above.

(c) As $s$ decreases (time runs forwards), the unconditional rate of mergers
of clades of sizes $k_1, k_2$ at time $s$ (per unit time, relative to number of species)
equals

$$
G(s)(1 - G(s))^{k_1 - 1} f(s) (1 - G(s))^{k_2 - 1} G(s),
$$

which we observe by considering the required heights of branchpoints for
this event to occur. Similarly the number of size $k_1 + k_2$ lineages at time $s$,
relative to number of species, equals

$$
G(s)(1 - G(s))^{k_1 + k_2 - 1} G(s).
$$

Thus the rate of splitting of a size $k_1 + k_2$ lineage into two lineages of sizes
$k_1, k_2$ equals

$$
\frac{G(s)(1 - G(s))^{k_1 - 1} f(s)(1 - G(s))^{k_2 - 1} G(s)}{G(s)(1 - G(s))^{k_1 + k_2 - 1} G(s)} = \frac{f(s)}{1 - G(s)} = \frac{1}{s(1+s)}.
$$

Thus, summing over the $k - 1$ choices of $k_1 \in \{1, \ldots, k-1\}$, if a lineage is of size $k$ then at time $s$ the stochastic rate of branching
is

$$
b_k(s) = \frac{k-1}{s(1+s)}.
$$

Since the rate of splitting does not depend on the choice of partition of $k$ into $k_1$
and $k_2$, the size of the left subclade lineage is Uniform on $\{1, 2, \ldots, k-1\}$. □
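Corollary 7 can be checked on a finite window of $\mathcal{A}_\infty$. The Python sketch below (our own construction; the window length, level $s$, size $k$ and increment $\delta$ are arbitrary) draws $\eta_i = V/(1-V)$, which has density $(1+s)^{-2}$, splits the species into lineages at level $s$ (a new lineage starts after every mark above $s$, and the two partial lineages at the window ends are discarded), and estimates the density of lineages, their mean size (which should be $1+s$ for a Geometric($(1+s)^{-1}$) law on $\{1, 2, \ldots\}$) and the branching rate $b_k(s)$ as the fraction of size-$k$ lineages having an internal mark in $(s-\delta, s]$, divided by $\delta$.

```python
import random

def sample_window(m, rng):
    # eta = V/(1-V) has P(eta <= s) = s/(1+s), i.e. density f(s) = (1+s)^(-2)
    return [v / (1.0 - v) for v in (rng.random() for _ in range(m))]

def complete_lineages(marks, s):
    """Lineages of the window at time s: neighbouring species are in different
    lineages iff the mark between them exceeds s. Returns (size, internal marks)
    for each lineage lying wholly inside the window."""
    blocks, size, internal, started = [], 1, [], False
    for h in marks:
        if h > s:
            if started:
                blocks.append((size, internal))
            started, size, internal = True, 1, []
        else:
            size += 1
            internal.append(h)
    return blocks

def branching_rate(k, s):
    # b_k(s) = (k - 1) / (s (1 + s)), Corollary 7(c)
    return (k - 1) / (s * (1.0 + s))

rng = random.Random(7)
m, s, k, delta = 400_000, 1.0, 3, 0.05
blocks = complete_lineages(sample_window(m, rng), s)

density = len(blocks) / (m + 1)                      # (a): about 1/(1+s)
mean_size = sum(b for b, _ in blocks) / len(blocks)  # (a): about 1+s
size_k = [inner for b, inner in blocks if b == k]
splits = sum(max(inner) > s - delta for inner in size_k)
rate_hat = splits / (len(size_k) * delta)            # (c): about b_k(s) for small delta
print(density, mean_size, rate_hat, branching_rate(k, s))
```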



4 Time reversal and consequences 

Recall that for a stationary Markov process, its time-reversal is also a sta- 
tionary Markov process. For a Markov process which is not stationary, 
or which is conditioned on a terminal value, the time-reversal is typically 
non-homogeneous. So Lemma 8 below highlights a special feature of our
processes. 

In the critical branching process underlying our model (Section 2), the
population size is the continuous-time Markov chain with transition rates

$$
q(i, i+1) = q(i, i-1) = i, \qquad i \ge 1. \qquad (6)
$$

Recall the definition of the complete tree $\mathcal{T}_n$. Write $(N_n(s), T^{\mathrm{or}}_n \ge s \ge 0)$ for
the associated process which counts the number of species at time $s$ before
present. The next lemma makes precise a sense in which the process $(N_n(s))$
is a time-reversal of the chain (6) started at $n$.
Lemma 8. Let $(\tilde N_n(s), 0 \le s \le T^0_n)$ be the continuous-time chain (6) with
$\tilde N_n(0) = n$, run until the first hitting time $T^0_n$ of state 0. Then

$$
(N_n(s),\ T^{\mathrm{or}}_n \ge s \ge 0) \stackrel{d}{=} (\tilde N_n(s),\ 0 \le s \le T^0_n).
$$

Proof. We verify that $(\tilde N_n(s), 0 \le s \le T^0_n)$ is the time-reversal of the population
size process by checking probabilities of primitive events (see Section
5.4 for more sophisticated views). Fix $s_M > s_{M-1} > \cdots > s_1 > s_0 = 0$
and positive integers $1 = k_M, k_{M-1}, \ldots, k_1 = n$ with $|k_m - k_{m-1}| = 1$ for $2 \le m \le M$. Set
$k_{M+1} = 0$. The event

as $s$ decreases, $N_n(s)$ jumps from $k_{m+1}$ to $k_m$ during $[s_m, s_m + ds_m]$
($\forall\, M \ge m \ge 1$) and makes no other jumps

has measure

$$
ds_M \times \prod_{m=M}^{2} \left[ e^{-2k_m(s_m - s_{m-1})}\, k_m\, ds_{m-1} \right] \times e^{-2k_1 s_1},
$$

where the first factor $ds_M$ comes from the uniform Bayes prior. For the
reversed process, the event

as $s$ increases, $\tilde N_n(s)$ jumps from $k_m$ to $k_{m+1}$ during $[s_m, s_m + ds_m]$
($\forall\, 1 \le m \le M$) and makes no other jumps

has probability

$$
\prod_{m=1}^{M} \left( e^{-2k_m(s_m - s_{m-1})}\, k_m\, ds_m \right).
$$

By inspection, the first measure is exactly $1/n$ times the second probability
(all other factors agree, and the second contains the extra factor $k_1 = n$),
so after conditioning the probability measures are equal. □

We now observe two simple consequences of this time-reversal identity.
First, the process $(\tilde N_n(s), 0 \le s \le T^0_n)$ is a skip-free martingale started at $n$ and
run until hitting 0, so by the hitting time formula for martingales

$$
\mathbb{P}\Big(\max_{0 \le s \le T^0_n} \tilde N_n(s) \ge c\Big) = \frac{n}{c}, \qquad c \ge n.
$$

So Lemma 8 implies

Corollary 9.

$$
\mathbb{P}\Big(\max_{T^{\mathrm{or}}_n \ge s \ge 0} N_n(s) \ge c\Big) = \frac{n}{c}, \qquad c \ge n.
$$
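The maximum of the continuous-time chain equals the maximum of its embedded jump chain, so Corollary 9 can be checked by simulating simple symmetric random walk. A minimal Python sketch (function name, seed and sample size are illustrative choices) that estimates the probability that the walk from $n$ reaches $c$ before 0, to be compared with $n/c$:

```python
import random

def reaches_before_zero(n, c, rng):
    """Simple symmetric random walk (the jump chain of (6)) from n, run until it
    hits 0 or c; True iff c is reached first."""
    k = n
    while 0 < k < c:
        k += 1 if rng.random() < 0.5 else -1
    return k == c

rng = random.Random(9)
n, runs = 5, 20_000
est = {c: sum(reaches_before_zero(n, c, rng) for _ in range(runs)) / runs
       for c in (5, 10, 20)}
print(est)   # compare with n/c = 1.0, 0.5, 0.25
```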
Second, every extinction within the process $\mathcal{T}_n$ corresponds to a downward
step in $N_n(s)$ as $s$ decreases, and hence to an upward step in $N_n(s)$
as $s$ increases. The number of such upward steps equals $(D_n - n)/2$, where
$D_n$ is the number of steps of the embedded jump chain of $\tilde N_n(\cdot)$, which is
just discrete-time simple symmetric random walk (a path from $n$ to 0 with $D_n$
steps has exactly $n$ more downward than upward steps).

Corollary 10. Within the model $\mathcal{T}_n$ of a clade on $n$ extant species, the total
number $N^{\mathrm{ext}}_n$ of extinct species is distributed as $(D_n - n)/2$, where $D_n$ is
the hitting time of 0 for simple symmetric random walk started at $n$. In
particular

$$
n^{-2} N^{\mathrm{ext}}_n \xrightarrow{d} \tfrac12 T_1,
$$

where $T_1$ is the first passage time of standard Brownian motion from 1 to 0,
with density function

$$
f_{T_1}(x) = (2\pi x^3)^{-1/2} e^{-1/(2x)}, \qquad 0 < x < \infty.
$$

The second assertion follows, of course, from weak convergence of simple 
random walk to Brownian motion. 
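For finite $n$ the law of $N^{\mathrm{ext}}_n$ is explicit. A standard fact not stated in the text, the hitting time theorem for simple symmetric random walk, gives $\mathbb{P}(D_n = m) = \frac{n}{m}\binom{m}{(m-n)/2}2^{-m}$, hence $\mathbb{P}(N^{\mathrm{ext}}_n = k) = \frac{n}{n+2k}\binom{n+2k}{k}2^{-(n+2k)}$. The Python sketch below (names hypothetical) evaluates this with log-gamma, checks it against a short simulation, and compares $\mathbb{P}(N^{\mathrm{ext}}_n \le n^2 y)$ with $\mathbb{P}(T_1/2 \le y) = \operatorname{erfc}(1/\sqrt{4y})$, using $N^{\mathrm{ext}}_n = (D_n - n)/2$, $n^{-2} D_n \to T_1$, and $\mathbb{P}(T_1 \le x) = \operatorname{erfc}(1/\sqrt{2x})$ from the reflection principle.

```python
import math
import random

def extinct_pmf(k, n):
    """P(N^ext_n = k) = P(D_n = n + 2k), with P(D_n = m) = (n/m) C(m, (m-n)/2) 2^(-m)
    by the hitting time theorem for simple symmetric random walk."""
    m = n + 2 * k
    log_binom = math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(n + k + 1)
    return n / m * math.exp(log_binom - m * math.log(2.0))

def extinct_cdf(y, n):
    """P(N^ext_n <= n^2 y), by summing the exact pmf."""
    return sum(extinct_pmf(k, n) for k in range(int(n * n * y) + 1))

def half_passage_cdf(y):
    """P(T_1/2 <= y) = P(T_1 <= 2y) = erfc(1/sqrt(4y))."""
    return math.erfc(1.0 / math.sqrt(4.0 * y))

def simulate_extinct(n, cap, rng):
    """Up-steps of simple symmetric random walk from n before it hits 0
    (= extinct species); the walk is abandoned once the count exceeds cap."""
    k, ups = n, 0
    while k > 0 and ups <= cap:
        if rng.random() < 0.5:
            k += 1
            ups += 1
        else:
            k -= 1
    return ups

rng = random.Random(11)
draws = [simulate_extinct(3, 3, rng) for _ in range(20_000)]
sim_p0 = draws.count(0) / len(draws)
sim_p1 = draws.count(1) / len(draws)
print(sim_p0, extinct_pmf(0, 3), sim_p1, extinct_pmf(1, 3))
print(extinct_cdf(0.5, 100), half_passage_cdf(0.5))
```

The law has a heavy tail ($\mathbb{P}(N^{\mathrm{ext}}_n = k)$ decays like $k^{-3/2}$), which is why the simulation caps each walk rather than running it to absorption.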

5 Exploiting the contour process 

The results so far answer some, but not all, questions one might ask about 
the complete tree $\mathcal{T}_n$ and the lineage tree $\mathcal{A}_n$ in our model. For instance,
the time-reversed process $(\tilde N_n(s))$ in Lemma 8 has an $n \to \infty$ rescaled limit,
the well-known Feller branching diffusion, which therefore is the limit of
the population size process $(N_n(s), T^{\mathrm{or}}_n \ge s \ge 0)$. But this doesn't tell
us anything about the relationship between $(N_n(s))$ and the lineage tree
$\mathcal{A}_n$. For instance, a conceptually interesting question in the species context
concerns $N_n(T^{\mathrm{mrca}}_n)$, the total number of species in the clade alive at the
time of most recent common ancestor of the extant species. Recall also that
the results of Section 3 were all based on the exact formula in Proposition 1,
but we have not yet given any indication of its proof. It turns out that
both these matters, and the local limit structure of the complete tree, can 
be studied using the contour process, described next. 

5.1 The contour process 

For any deterministic population process in continuous time, starting at the 
birth of one single individual, in which individuals have birth times, death 
times, and may give birth to children at distinct times, there is a particular 
representation as a rooted planar tree which we now describe. Each individual
is represented by an edge whose length equals that individual's lifetime.
The birth of an offspring corresponds to a branchpoint from its parent's 
edge, and the length of the parent's edge up to this branchpoint equals the 
age of the parent at this offspring's birth time. From the branchpoint the 
offspring's edge is drawn to the right of the parent's edge. If the total pop- 
ulation is finite, then we can label the individuals in a "depth-first" search 
order. This is illustrated in Figure 4 where tree edges have been drawn as 
full vertical lines and the branchpoints have been indicated by horizontal 
dotted lines. 



Figure 4. A realization of a tree $\mathcal{T}_{t,n}$ with $n = 6$ extant individuals (labeled
{2, 4, 5, 12, 13, 14}) and its contour process representation $C(u)$.

Associated with such a rooted planar tree is its contour process, defined as
follows (these ideas go back to Neveu and Pitman [17], and their broader
significance can be seen in the lecture notes of Pitman [?]). The contour
process C(u) is a continuous function giving the distance from the root at 
time u in a unit-speed depth-first walk around the tree. Such a walk starts at 
the root, traverses each edge completely once upwards and once downwards, 
following the depth-first order (intuitively: clockwise around the edges of the 
tree), ending back at the root. So the contour process consists of alternating
line segments of slope $+1$ ("rises") and slope $-1$ ("falls"). The unit speed
convention implies that heights in the contour process match the times in 
the population process (birth and death times are matched respectively by 
the local minima and local maxima in the contour process). 
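The construction above is mechanical enough to code. The Python sketch below (the `Individual` class and the function names are our own, hypothetical choices) lists the successive local minima and maxima of the contour process of a finite tree, assuming the drawing convention just described: offspring edges hang to the right of the parent's edge, so a clockwise walk descending the right side of an edge meets the latest-born child first. It also counts upcrossings of a level $t$, which equals the number of individuals alive at time $t$.

```python
from dataclasses import dataclass, field

@dataclass
class Individual:
    birth: float
    death: float
    children: list = field(default_factory=list)  # Individual objects, any order

def contour_extrema(root):
    """Heights of the successive local minima and maxima of the contour process.
    Maxima are death times and minima are birth times (branchpoints)."""
    out = [root.birth]

    def visit(ind):
        out.append(ind.death)            # rise to the top of this edge
        # falling down the right side, the latest-born child is met first
        for child in sorted(ind.children, key=lambda c: c.birth, reverse=True):
            out.append(child.birth)      # local minimum at the branchpoint
            visit(child)

    visit(root)
    out.append(root.birth)               # final fall back to the root
    return out

def rises_and_falls(extrema):
    return [b - a for a, b in zip(extrema, extrema[1:])]

def upcrossings(extrema, t):
    """Number of rises crossing height t (= individuals alive at time t)."""
    return sum(a < t < b for a, b in zip(extrema, extrema[1:]))

root = Individual(0, 3, [Individual(1, 2.5), Individual(2, 4)])
ext = contour_extrema(root)
print(ext)                    # [0, 3, 2, 4, 1, 2.5, 0]
print(rises_and_falls(ext))   # [3, -1, 2, -3, 1.5, -2.5]
print(upcrossings(ext, 2.2))  # 3: the root and both children are alive at 2.2
```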
5.2 Contour processes associated with random trees

Recall that $\mathcal{T}$ denotes the continuous-time critical branching process started
at time 0 with one individual and continued until extinction. The relevance
of contour processes is indicated in the next result of Neveu-Pitman-Le Gall

mini. 

Proposition 11. In the contour process of $\mathcal{T}$ the sequence of rises and falls $(\xi_1, -\xi_2, \xi_3, \ldots, \xi_{M-1})$, excluding the last fall, has the distribution derived from a sequence $(\xi_i)_{i \ge 1}$ of independent Exponential(1) variables, for

$$M := \min\{m : \xi_1 - \xi_2 + \xi_3 - \xi_4 + \cdots - \xi_m < 0\}.$$

Call this contour process $(\xi_1, -\xi_2, \ldots, \xi_{M-1})$ an ERW excursion, for Exponential random walk. Accordingly, call the infinite sequence $(\xi_1, -\xi_2, \xi_3, -\xi_4, \ldots)$ an ERW process. Here is a classical result.

Lemma 12. Let H be the maximum height in an ERW excursion, or equivalently (by Proposition 11) the extinction time of $\mathcal{T}$. Then

$$P(H > h) = \frac{1}{1+h}, \qquad 0 \le h < \infty.$$

Proof. This follows directly from the law of the population size process of $\mathcal{T}$ given earlier. The extinction time of $\mathcal{T}$ is greater than h if and only if the population size of $\mathcal{T}$ at time h is strictly greater than 0. By that law, the probability of this is $1 - h(1+h)^{-1} = (1+h)^{-1}$. □
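Proposition 11 and Lemma 12 can be cross-checked numerically without building any tree. The Python sketch below (hypothetical function names, standard-library `random` only) simulates the ERW itself and estimates $P(H > h)$; by Lemma 12 the estimate should be close to $1/(1+h)$. Since only the event $\{H > h\}$ matters, each run stops as soon as the walk climbs above h or falls below 0.

```python
import random


def erw_exceeds(h: float, rng: random.Random) -> bool:
    """Run an ERW excursion (rises and falls i.i.d. Exponential(1)) from 0 and
    report whether its maximum height H exceeds h.

    Only the event {H > h} matters, so the walk is stopped as soon as it climbs
    above h or falls below 0, which keeps every run short."""
    height = 0.0
    while True:
        height += rng.expovariate(1.0)             # rise
        if height > h:
            return True
        height -= rng.expovariate(1.0)             # fall
        if height < 0.0:
            return False                           # excursion ended with H <= h


def tail_estimate(h: float, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(H > h)."""
    rng = random.Random(seed)
    return sum(erw_exceeds(h, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for h in (0.5, 1.0, 3.0):
        print(h, round(tail_estimate(h), 3), round(1.0 / (1.0 + h), 3))
```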

Before proceeding to new results let us indicate the proof [20] of Proposition^, because our arguments in subsequent sections will use similar ideas. Fix t and n. Condition the contour process $C(\cdot)$ to have exactly n upcrossings over height t; see Figure 4. This gives the contour process of the random tree ($\mathcal{T}^+_{t,n}$, say) which is the CBP conditioned on having exactly n individuals alive at time t. This is the same as our model $\mathcal{T}_{t,n}$ except for the "direction of time parameter" convention, and except for the fact that in $\mathcal{T}_{t,n}$ the process terminates with the n individuals at the present time, whereas in $\mathcal{T}^+_{t,n}$ the process of descendants of these n individuals continues until extinction. But the latter difference plays no role in the following argument. The heights of the minima between each pair of successive upcrossings in Figure 4 match the divergence times of the lineages of that pair of extant individuals. Marking these heights at regular horizontal spacings gives exactly the point process $\mathcal{A}_{t,n}$ as in Figure 1, except for reflecting the vertical time scale. Since $C(\cdot)$ is strong Markov and stationary, the parts of an ERW excursion between a downcrossing of t and the next upcrossing of t are mutually independent, and moreover are distributed exactly as the reflection of the original ERW excursion conditioned not to have height greater than t. Thus these heights of lineage divergence, when measured on the reflected time scale (i.e. downwards from t), are distributed as the maximum height H in Lemma 12 conditioned on $\{H < t\}$. This conditioned distribution is the distribution (3), as required for Proposition^.
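The argument above says that, given the origin height t, the $n-1$ divergence depths (measured downwards from t) are i.i.d. with the law of H conditioned on $\{H < t\}$, where $P(H \le h) = h/(1+h)$ by Lemma 12. A minimal Python sketch of sampling them by inverse CDF follows; the function names are hypothetical. The largest depth is the depth of the lowest minimum, i.e. of the most recent common ancestor below t, which is used in Section 5.3.

```python
import math
import random
from typing import List


def sample_divergence_depths(t: float, n: int, rng: random.Random) -> List[float]:
    """n - 1 i.i.d. depths, measured downwards from level t, each with the law
    of H conditioned on {H < t}, where P(H <= h) = h / (1 + h) by Lemma 12.

    Inverse CDF: solving (h / (1 + h)) / (t / (1 + t)) = u gives
    h = c / (1 - c) with c = u * t / (1 + t)."""
    depths = []
    for _ in range(n - 1):
        c = rng.random() * t / (1.0 + t)           # c < t / (1 + t) < 1
        depths.append(c / (1.0 - c))
    return depths


def conditional_mean(t: float) -> float:
    """E[H | H < t] = ((1 + t) / t) * log(1 + t) - 1."""
    return (1.0 + t) / t * math.log1p(t) - 1.0


if __name__ == "__main__":
    rng = random.Random(0)
    depths = sample_divergence_depths(3.0, 6, rng)   # n = 6 extant species, t = 3
    print([round(d, 3) for d in depths])
    print("depth of the MRCA below t:", round(max(depths), 3))
```

The closed form in `conditional_mean` comes from integrating $h$ against the conditioned density $\frac{1+t}{t}(1+h)^{-2}$ on $[0, t]$.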

5.3 Species numbers at time of most recent common ances- 
tor and weak convergence of the contour process 

Recall that $N_n(T_n^{mrca})$ stands for the number of species alive at the time of the most recent common ancestor. In the contour process, the number of species at any time s after the origin is the number of upcrossings (which equals the number of downcrossings) of height s. If the time since the origin of $\mathcal{T}_n$ is $T_n^{or} = t$, then the contour process has n up- and downcrossings at height t. If the time of the most recent common ancestor is $T_n^{mrca} = s$, then the maximal depth of the subexcursions below height t, measured away from t, is s; see Figure 5. In other words, the lineage divergence of the most recent common ancestor is the lowest local minimum between the first and last upcrossings of t, and occurs at height $t-s$.




Figure 5. The parts of the contour process in $[u_{t-s}, u_t]$ and $[d_t, d_{t-s}]$ describe the number of species alive at the time of the most recent common ancestor.

In the contour process, mark by $u_s$ the horizontal coordinate of the first upcrossing of a height s, and by $d_s$ the coordinate of the last downcrossing of this height. There are no up- or downcrossings of $t-s$ before the first upcrossing of $t-s$ or after the last downcrossing of $t-s$. And if $t-s$ is the height of the lineage divergence of the most recent common ancestor, as in Figure 5, then there are no up- or downcrossings of $t-s$ between the first upcrossing of t and the last downcrossing of t. So $N_n(T_n^{mrca})$ is the number of upcrossings of $t-s$ between $u_{t-s}$ and $u_t$, plus the number of downcrossings of $t-s$ between $d_t$ and $d_{t-s}$. Since the contour process is an ERW excursion conditioned to have n upcrossings and downcrossings at height $T_n^{or}$, we can now calculate the following.

Lemma 13. Conditional on $(T_n^{or}, T_n^{mrca}) = (t, s)$, $N_n(T_n^{mrca})$ is distributed as the sum of two independent Geometric($p_n$) random variables on $\{1, 2, \ldots\}$, where

$$p_n = 1 - \frac{t-s}{1+t-s}\cdot\frac{s}{1+s}.$$

Proof. Since the contour process $C(\cdot)$ is strong Markov and stationary, the part of the process between the first upcrossing $u_{t-s}$ of $t-s$ and the first upcrossing $u_t$ of t, considered from height $t-s$ upwards, i.e. $C(u) - (t-s)$ for $u_{t-s} \le u \le u_t$, is distributed as an ERW process conditioned to reach height s before it reaches depth $-(t-s)$, and stopped when it first hits s. Since $C(\cdot)$ has the same law when its u coordinate is run in reverse, the part of the contour process between the last downcrossing $d_t$ of t and the last downcrossing $d_{t-s}$ of $t-s$, run backwards in the u coordinate, i.e. $C(u) - (t-s)$ for $d_{t-s} \ge u \ge d_t$, is also distributed as an ERW process conditioned to reach height s before it reaches depth $-(t-s)$, and stopped when it first hits s. Additionally, these two parts of the contour process are independent.

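The probability that an ERW process reaches s before it reaches $-(t-s)$ is, by the law of the maximum height H in Lemma 12,

$$\frac{P(H>s)}{P(H>s) + P(H>t-s) - P(H>s)\,P(H>t-s)} = \frac{1+t-s}{1+t}.$$

(Each excursion above 0 reaches s with probability $P(H>s)$ and, by reflection, each excursion below 0 reaches $-(t-s)$ with probability $P(H>t-s)$; these excursions alternate independently.)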

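The probability that an ERW process started at 0 makes exactly k upcrossings of 0 (that is, of height $t-s$ in the original coordinates) before it first hits s, while staying above $-(t-s)$, is, for $k = 1, 2, \ldots$,

$$\bigl(P(H<t-s)\,P(H<s)\bigr)^{k-1} P(H>s) = \left(\frac{t-s}{1+t-s}\cdot\frac{s}{1+s}\right)^{k-1}\frac{1}{1+s}.$$

Dividing by the probability $\frac{1+t-s}{1+t} = \frac{1}{(1+s)\,p_n}$ of reaching s first shows that the number of upcrossings of $t-s$ that $C(u)$ makes during $[u_{t-s}, u_t]$ has the Geometric($p_n$) distribution $(1-p_n)^{k-1}p_n$, with $p_n = 1 - \frac{t-s}{1+t-s}\cdot\frac{s}{1+s}$. By the time reversal above, the same holds for the downcrossings during $[d_t, d_{t-s}]$. □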
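The two ingredients of this proof (the probability $\frac{1+t-s}{1+t}$ of reaching s first, and the geometric number of upcrossings) can be checked by simulating the shifted ERW directly. This is a hedged Python sketch with hypothetical names: levels are measured from $t-s$, the walk starts at 0 at the beginning of a rise (by memorylessness of the Exponential(1) rise in progress at $u_{t-s}$), and it is killed at $-(t-s)$. For $t = 3$, $s = 1$ the exact values are $3/4$, $1/p_n = 3/2$ and $p_n = 2/3$.

```python
import random
from typing import Dict, Optional, Tuple


def upcrossings_before_exit(s: float, depth: float, rng: random.Random) -> Optional[int]:
    """ERW started at level 0 (height t - s) at the start of a rise.

    Runs until the walk climbs to s or falls to -depth.  Returns the number of
    upcrossings of level 0, counting the initial one, if s is reached first,
    and None if -depth is reached first (the conditioning event fails)."""
    level, k = 0.0, 1
    while True:
        top = level + rng.expovariate(1.0)         # rise
        if level < 0.0 <= top:
            k += 1                                 # a new upcrossing of level 0
        if top >= s:
            return k
        level = top - rng.expovariate(1.0)         # fall
        if level <= -depth:
            return None


def check_lemma_13(t: float, s: float, trials: int = 50_000,
                   seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """(Monte Carlo estimate, exact value) for three quantities in Lemma 13."""
    rng = random.Random(seed)
    runs = [upcrossings_before_exit(s, t - s, rng) for _ in range(trials)]
    hits = [k for k in runs if k is not None]
    p = 1.0 - (t - s) / (1.0 + t - s) * s / (1.0 + s)   # p_n of Lemma 13
    return {
        "P(reach s first)": (len(hits) / trials, (1.0 + t - s) / (1.0 + t)),
        "mean upcrossings": (sum(hits) / len(hits), 1.0 / p),
        "P(k = 1)": (hits.count(1) / len(hits), p),
    }


if __name__ == "__main__":
    for name, (estimate, exact) in check_lemma_13(3.0, 1.0).items():
        print(f"{name}: {estimate:.3f} vs {exact:.3f}")
```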
Since, by Corollary El, $(n^{-1}T_n^{or}, n^{-1}T_n^{mrca}) \xrightarrow{d} (T^{or}, T^{mrca})$ as $n \to \infty$,

$$n p_n = n\left(1 - \frac{T_n^{or}-T_n^{mrca}}{1+T_n^{or}-T_n^{mrca}}\cdot\frac{T_n^{mrca}}{1+T_n^{mrca}}\right) \xrightarrow{d} \frac{1}{T^{or}-T^{mrca}} + \frac{1}{T^{mrca}},$$

and the above two Geometric($p_n$) variables, when rescaled by $n^{-1}$, converge to independent Exponential($\lambda(T^{or}, T^{mrca})$) variables, where $\lambda(t,s) = (t-s)^{-1} + s^{-1}$. Consequently, the conditional law of $n^{-1}N_n(T_n^{mrca})$ given $(T_n^{or}, T_n^{mrca})$ converges to a Gamma law with shape parameter 2 and rate (inverse scale) parameter $\lambda(T^{or}, T^{mrca})$. Combining this with the result of Corollary 03 we have established assertion (7) below.

Corollary 14. The joint limit behavior of the triple $(T_n^{or}, T_n^{mrca}, N_n(T_n^{mrca}))$ is given by

$$\bigl(n^{-1}T_n^{or},\ n^{-1}T_n^{mrca},\ n^{-1}N_n(T_n^{mrca})\bigr) \xrightarrow{d} (T^{or}, T^{mrca}, N^{mrca}),$$

where the limit has the joint density

$$f_{T^{or},T^{mrca},N^{mrca}}(t,s,r) = t^{-2}s^{-2}\lambda(t,s)^2\, r\, e^{-1/s-\lambda(t,s)\,r} = (t-s)^{-2}s^{-4}\, r\, \exp\left(-\frac{t-s+tr}{s(t-s)}\right), \qquad 0<s<t,\ 0<r. \qquad (7)$$

The marginal density of $N^{mrca}$ is

$$f_{N^{mrca}}(r) = 2(1+r)^{-3}, \qquad r > 0.$$

The marginal density formula follows from (7) via a calculus exercise. Note that while the distribution of $N^{mrca}$ has mean 1, it has infinite variance.
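The limit law (7) has a convenient sequential description, which also gives a quick numerical check of the marginal $f_{N^{mrca}}(r) = 2(1+r)^{-3}$, i.e. $P(N^{mrca} > r) = (1+r)^{-2}$. Integrating r out of (7) leaves $t^{-2}s^{-2}e^{-1/s}$ on $0<s<t$; integrating t out then gives $s^{-3}e^{-1/s}$, so $1/T^{mrca}$ is Gamma with shape 2 and scale 1, and given $T^{mrca} = s$ the origin time has density $s\,t^{-2}$ on $(s,\infty)$. The Python sketch below (hypothetical names) samples in that order; it is an illustration, not part of the original argument.

```python
import random
from typing import Tuple


def sample_limit_triple(rng: random.Random) -> Tuple[float, float, float]:
    """One draw of (T_or, T_mrca, N_mrca) from the joint density (7)."""
    s = 1.0 / rng.gammavariate(2.0, 1.0)   # T_mrca: 1/T_mrca ~ Gamma(shape 2, scale 1)
    v = rng.random()
    while v == 0.0:                         # avoid dividing by zero below
        v = rng.random()
    t = s / v                               # T_or given s: P(T_or > x) = s / x for x > s
    lam = 1.0 / (s * (1.0 - v))             # equals 1/(t - s) + 1/s, computed stably
    r = rng.gammavariate(2.0, 1.0 / lam)    # Gamma(shape 2, rate lam); gammavariate takes a scale
    return t, s, r


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_limit_triple(rng)[2] for _ in range(100_000)]
    for r0 in (0.5, 1.0, 3.0):
        empirical = sum(x > r0 for x in draws) / len(draws)
        print(r0, round(empirical, 4), round((1.0 + r0) ** -2, 4))
```

Note that `random.gammavariate(alpha, beta)` uses a shape/scale convention, so the rate $\lambda$ enters as the scale $1/\lambda$.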

Remark. The contour process of $\mathcal{T}_{t_n,n}$ (illustrated in Figure 5), in the limit $t_n/n \to t \in (0,\infty)$, converges after rescaling to a Brownian excursion conditioned on its total local time at height t being equal to 1. Results like Corollary 14 may be reinterpreted as providing exact formulas for quantities defined in terms of such conditioned Brownian excursions.

5.4 Extinct species 

Textbooks (e.g. ^H] page 24) often say 

the probability that a given fossil is actually part of an ancestral 
lineage [of some extant species] is actually rather remote. 
Various calculations relevant to this issue can be done within our model.

Consider some species v that originated at time h before the present. In the limit as $n \to \infty$, the distribution of the clade of species descending from v is given by the local limit structure of the complete tree. As stated in Section 1531, in the limit the descendants of a species v evolve, as time runs forwards, as in an ordinary critical branching process $\mathcal{T}$. Then the chance that some descendant of v (or v itself) is extant at present equals the chance that its descendant tree $\mathcal{T}$ survives for time h or longer. By Lemma 12 this is precisely $(1+h)^{-1}$, so we have the following.

Corollary 15. For any species alive at time h before the present, the chance that some of its descendant species (or the species itself) is extant is, in the limit $n \to \infty$,

$$\frac{1}{1+h}.$$

Now consider the total number $N_n^{anc}$ of species that are ancestral to the n extant ones. (Precisely, we exclude the extant species, and go back to the time of origin of the clade.) Intuitively, because (Lemma [SJ) the number of species at time h is $N_n(h) \sim n$ for $h = o(n)$, and because (Corollary 5) the time of origin $T_n^{or}$ is of order $O(n)$, we expect from Corollary 15 that

$$E[N_n^{anc}] \approx \int_0^{O(n)} \frac{n}{1+h}\,dh \sim n\log n.$$

We shall prove a precise result as Corollary 17, based on the following lemma.

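Lemma 16. Conditional on $T_n^{or} = t$, the total number of ancestral individuals $N_n^{anc}$ in $\mathcal{T}_n$ satisfies

$$N_n^{anc} \stackrel{d}{=} \sum_{i=1}^{n} X_i,$$

where $X_i$, $1 \le i \le n$, are independent, $X_1$ has Poisson(t) distribution and $X_2, \ldots, X_n$ have the law, with $f_t(\cdot)$ as in (3),

$$P(X_i = k) = \int_0^t e^{-s}\frac{s^k}{k!}\, f_t(s)\,ds, \qquad k \ge 0.$$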

Proof. Label the extant individuals $\{1, 2, \ldots, n\}$ from left to right as they appear in the contour process. Let $X_i$ be the number of ancestors of the ith extant individual, not including any of those already counted in $X_j$, $j < i$.

Suppose $T_n^{or} = t$; then the ancestry of the extant individuals is described by the part of the contour process $C(\cdot)$ below height t. Recall that the part of $C(\cdot)$ below t consists of $n-1$ independent sub-excursions below t, which we label $e_i$, $1 \le i \le n-1$, and the part of $C(\cdot)$ before the first upcrossing and after the last downcrossing of t; we label the former part $e_{0,R}$. See Figure 6.

Let $h_i$, $1 \le i \le n-1$, be the depths of the sub-excursions $e_i$, so that $t - h_i$ are the heights of the lowest points of $e_i$. These match the times of divergence of lineages of extant individuals. Their law was given by (3) of Proposition ^. Now partition the excursions at their lowest points, and let $e_{i,R}$, $1 \le i \le n-1$, denote the parts on the right. Then Figure 6 shows that the ancestors of the 1st extant individual correspond, in the piece $e_{0,R}$ of the contour process, to the levels of constancy of the process $\underline{e}_{0,R}(u) = \inf_{v \ge u} e_{0,R}(v)$. These levels of constancy of $\underline{e}_{0,R}$ match the times of lineage divergence of the ancestors of individual 1. Similarly, for the ith extant individual, $2 \le i \le n$, Figure 6 shows that the additional ancestors of individual i (excluding those appearing as ancestors of extant individuals $j < i$) correspond, in the piece $e_{i-1,R}$ of the contour process, to the levels of constancy of the process $\underline{e}_{i-1,R}(u) = \inf_{v \ge u} e_{i-1,R}(v)$.



Figure 6. Ancestral lineages of the extant individuals (labeled {1, 2, 3, 4, 5, 6}) are matched in the contour process by the levels of constancy of the processes $\underline{e}_{i-1,R}$, for $1 \le i \le n$.

So the number of ancestors $X_i$ of the ith extant individual is the number of levels of constancy of the process $\underline{e}_{i-1,R}(\cdot)$. It is clear that the piece $e_{0,R}$ is distributed as an ERW process conditioned to hit t before 0, and stopped the first time it hits t. It is less obvious but nonetheless true (see Lemma 6 of [20]) that, given $h_i$, the piece $e_{i,R}$ is also distributed as an ERW process conditioned to reach $h_i$ before 0, and stopped the first time it hits $h_i$.
For such a conditioned ERW, the levels of constancy of its future infimum process form a rate-1 Poisson process, restricted to the set $[0, t]$ for $e_{0,R}$ and to $[0, h_{i-1}]$ for $e_{i-1,R}$, $2 \le i \le n$. This can easily be seen for the levels of constancy of the past supremum process of a conditioned ERW (Lemma 6 of [20]); time reversibility of an ERW excursion then implies the rest. So the number of ancestors of the 1st extant individual is Poisson(t), and the number of additional ancestors of the extant individual i, $2 \le i \le n$, is Poisson($h_{i-1}$). Combining this with the distributions of the depths $h_i$, given by (3) in Proposition ^, we have proved the claim. □
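Lemma 16 translates directly into a sampler for $N_n^{anc}$ given $T_n^{or} = t$. The Python sketch below uses hypothetical names and assumes, as in the proof, that $f_t$ is the density of H conditioned on $\{H < t\}$, i.e. $f_t(h) = \frac{1+t}{t}(1+h)^{-2}$ on $[0, t]$. Each Poisson count is generated as the number of points of a rate-1 Poisson process on an interval, mirroring the levels-of-constancy picture in the proof. The exact conditional mean follows from $E[X_1] = t$ and $E[X_i] = E[h]$ for $i \ge 2$.

```python
import math
import random


def poisson_count(mean: float, rng: random.Random) -> int:
    """Number of points of a rate-1 Poisson process in [0, mean], i.e. a
    Poisson(mean) variate (cost grows linearly with `mean`)."""
    count, x = 0, rng.expovariate(1.0)
    while x <= mean:
        count += 1
        x += rng.expovariate(1.0)
    return count


def sample_n_anc(t: float, n: int, rng: random.Random) -> int:
    """One draw of N_anc given T_or = t via Lemma 16: X_1 ~ Poisson(t), and
    X_i ~ Poisson(h) with h drawn from f_t (H given H < t) for 2 <= i <= n."""
    total = poisson_count(t, rng)
    for _ in range(n - 1):
        c = rng.random() * t / (1.0 + t)           # inverse CDF of H given H < t
        total += poisson_count(c / (1.0 - c), rng)
    return total


def mean_n_anc(t: float, n: int) -> float:
    """E[N_anc | T_or = t] = t + (n - 1) * (((1 + t) / t) * log(1 + t) - 1)."""
    return t + (n - 1) * ((1.0 + t) / t * math.log1p(t) - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    t, n, reps = 50.0, 50, 2000
    average = sum(sample_n_anc(t, n, rng) for _ in range(reps)) / reps
    print(round(average, 1), round(mean_n_anc(t, n), 1))
```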

For the limit of the number of ancestors $N_n^{anc}$ we have the following.

Corollary 17. As $n \to \infty$,

$$\frac{N_n^{anc}}{n\log n} \xrightarrow{p} 1.$$

Proof. Fix $(t_n)$ such that $t_n/n \to t \in (0,\infty)$. Because (Corollary 0) $n^{-1}T_n^{or}$ has a distributional limit on $(0,\infty)$, it suffices to prove the following: conditional on $\{T_n^{or} = t_n\}$ we have $\frac{N_n^{anc}}{n\log n} \xrightarrow{p} 1$.

We shall prove this using the representation $N_n^{anc} = \sum_{i=1}^{n} X_i$ from Lemma 16, where in the following argument we are always conditioning on $\{T_n^{or} = t_n\}$. Note that the contribution to the sum from $X_1$ is negligible (because $X_1$ has Poisson($t_n$) distribution, so $X_1 = O_p(n) = o_p(n\log n)$), so we may assume $X_1$ has the same distribution as the $X_i$, $2 \le i \le n$. We now calculate

$$E[X_2] = \int_0^{t_n} s\, f_{t_n}(s)\,ds \sim \int_0^{t_n} s(1+s)^{-2}\,ds \sim \log t_n \sim \log n,$$

and a similar calculation shows

$$\operatorname{var}[X_2] = O(n).$$

Thus

$$E[N_n^{anc}] \sim n\log n, \qquad \operatorname{var}[N_n^{anc}] = O(n^2),$$

and the desired result $\frac{N_n^{anc}}{n\log n} \to 1$ follows via Chebyshev's inequality. □
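The two moment estimates in this proof can be made explicit. Under the same assumption on $f_t$ as in Lemma 16, each $X_i$ with $i \ge 2$ is mixed Poisson, so $E[X_i] = E[h]$ and $\operatorname{var}[X_i] = E[h] + \operatorname{var}[h]$. The Python sketch below (hypothetical names) computes the exact conditional mean and variance and, purely for illustration, takes $t_n = n$. For this choice the mean divided by $n\log n$ is already close to 1, while the standard deviation divided by $n\log n$ shrinks only like $1/\log n$, which is still enough for Chebyshev's inequality.

```python
import math
from typing import Tuple


def n_anc_moments(t: float, n: int) -> Tuple[float, float]:
    """Exact mean and variance of N_anc given T_or = t, from Lemma 16.

    X_1 ~ Poisson(t); for i >= 2, X_i is Poisson(h) with h ~ f_t, so
    E[X_i] = E[h] and var[X_i] = E[h] + var[h] (mixed Poisson law)."""
    a = (1.0 + t) / t                                           # normalising constant of f_t
    mean_h = a * math.log1p(t) - 1.0                            # int_0^t s f_t(s) ds
    second_h = a * (t - 2.0 * math.log1p(t) + t / (1.0 + t))    # int_0^t s^2 f_t(s) ds
    var_x = mean_h + (second_h - mean_h ** 2)
    return t + (n - 1) * mean_h, t + (n - 1) * var_x


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        mean, var = n_anc_moments(float(n), n)       # illustrative choice t_n = n
        scale = n * math.log(n)
        print(n, round(mean / scale, 4), round(math.sqrt(var) / scale, 4))
```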

5.5 Local limit structure of the complete tree 

The contour process makes it conceptually easy to see a result, complement- 
ing Corollary for the lineage tree A n , concerning the local limit behavior 
of the complete tree $\mathcal{T}_n$. It turns out that the local structure relative to a typical individual in $\mathcal{T}_n$ converges to the local structure relative to the root in an infinite tree that can be easily defined from a CBP tree.
There are two versions of such results, depending on whether for the typical
individual we choose a random extant species or a random species from the 
entire history of the clade (which will be extinct with probability $\to 1$ as
$n \to \infty$). The details in obtaining the results are rather fussy, so we only
outline the proofs. 

Let i be an individual in the complete tree $\mathcal{T}_n$, with birth time b(i), say. Within this section our convention is that the time parameter in $\mathcal{T}_n$ increases as time runs forwards. For $a > 0$ let $\mathcal{T}_n(i, [b(i)-a, b(i)+a])$ denote the subtree of $\mathcal{T}_n$ comprising all the individuals j whose birth time is in the time interval $[b(i)-a, b(i)+a]$ and for whom the divergence time of their lineage from that of i is in the time interval $[b(i)-a, b(i)+a]$. See Figure 7. Call i the distinguished individual in $\mathcal{T}_n(i, [b(i)-a, b(i)+a])$.



Figure 7. The local structure of the complete tree, relative to individual i (vertical axis: time, with the levels $b(i)-a$, $b(i)$ and $b(i)+a$ marked).

We now describe an infinite random tree $\tilde{\mathcal{T}}$ derived from the CBP. Take a distinguished individual, born at time 0, and let the tree of it and its descendants be distributed as the CBP tree $\mathcal{T}$. Let the parent of this individual have Exponential(1) age at time 0 and an independent Exponential(1) lifetime after time 0. Inductively, let the grandparent have Exponential(1) age at the birth of the parent, and an independent Exponential(1) lifetime after that birth time; and so on. Let each of these ancestors have other children during its lifetime at the times of a Poisson (rate 1) process, and let the trees of such children and their descendants be distributed as independent CBP trees. This completes the description of $\tilde{\mathcal{T}}$.

Recall the construction (Proposition 11) of the CBP tree $\mathcal{T}$ from the ERW excursion $(\xi_1, -\xi_2, \xi_3, \ldots)$. It is easy to check that, given a two-sided ERW process $(\ldots, -\xi_{-2}, \xi_{-1}, -\xi_0, \xi_1, -\xi_2, \xi_3, \ldots)$, an analogous construction produces the infinite tree $\tilde{\mathcal{T}}$. Write $\tilde{\mathcal{T}}[-a,a]$ for the subtree of $\tilde{\mathcal{T}}$ comprising the individuals j whose birth time is in the time interval $[-a, a]$ and for whom the divergence time of the lineages of j and of the distinguished individual is in the time interval $[-a, a]$. Note that

$$\tilde{\mathcal{T}}[-a,a] \text{ is determined by } (\xi_i,\ M^- \le i \le M^+), \qquad (8)$$

where $(\xi_{M^-}, -\xi_{M^-+1}, \ldots, -\xi_0, \xi_1, \ldots, \xi_{M^+-1}, -\xi_{M^+})$ is the excursion of the two-sided ERW process above height $-a$.

Here is the result for the convergence of the local structure of $\mathcal{T}_n$, as seen relative to a random (extinct) individual, to the local structure of $\tilde{\mathcal{T}}$.

Proposition 18. Let $I_n$ denote a uniform random species from the clade $\mathcal{T}_n$. Then as $n \to \infty$, for fixed $a > 0$,

$$\mathcal{T}_n(I_n, [b(I_n)-a, b(I_n)+a]) \xrightarrow{d} \tilde{\mathcal{T}}[-a, a].$$

Remark. The underlying notion of convergence of finite trees is the 
natural one, which can be formalized in several equivalent ways, e.g. via a 
point process representation. 

Proof. We outline the proof, omitting details. Write $(\xi_i, i \ge 1)$ for the ERW process. Fix an integer $m \ge 2$. Let $\theta_{m,N}$ be the empirical distribution of the N 2m-vectors $\{(\xi_{i+1}, \xi_{i+2}, \ldots, \xi_{i+2m}),\ 0 \le i \le N-1\}$. So $\theta_{m,N}$ is a random probability distribution which (using the Glivenko-Cantelli theorem on $\mathbb{R}^{2m}$) converges in probability, as $N \to \infty$, to the non-random probability distribution $\mu_m = \mathrm{dist}(\xi_1, \ldots, \xi_{2m})$. By large deviation theory (see [Sj §6.3) this convergence remains true conditional on events $A_N$ for which $1/P(A_N) = O(\beta^N)$ for all $\beta > 1$.

To prove the proposition, recall (Lemma EJ) that $T_n^{or}$ is of order n. So we can fix $(t_n)$ such that $t_n/n \to t \in (0,\infty)$, and it is sufficient to prove the proposition for $\mathcal{T}_{t_n,n}$. Fix also integers $N_n$ such that $N_n/n^2 \to v \in (0,\infty)$. Let $A_{N_n}$ be the event that an ERW process has an excursion above 0 with exactly $N_n$ rises and falls, and that this excursion has exactly n upcrossings over height $t_n$. (Then $n^2$ is precisely the right scaling for the number of rises and falls of an excursion with n upcrossings of a level $t_n$ of order n.) Conditioned on this event, the ERW excursion is the contour process of a random tree $\mathcal{T}^+_{t_n,n,N_n}$, which is the tree $\mathcal{T}_{t_n,n}$ continued until extinction and conditioned to have total number of individuals equal to $N_n$. Let us first prove the proposition for $\mathcal{T}^+_{t_n,n,N_n}$.

One can show that the probability $P(A_{N_n})$ decays no faster than polynomially in $N_n$, so by the "large deviation" result above, the empirical distribution $\theta_{m,N_n}$ of 2m-tuples conditioned on $A_{N_n}$ converges to $\mu_m$. This implies the weaker result that, for $J_n$ uniform on $\{2, 4, 6, \ldots, 2N_n\}$,

$$(\xi_{J_n-m+1}, \ldots, \xi_{J_n}, \ldots, \xi_{J_n+m}) \xrightarrow{d} \mu_m, \qquad (9)$$

where the left side is conditioned on $A_{N_n}$. But this says that, relative to a uniform random individual $I_n$ in $\mathcal{T}^+_{t_n,n,N_n}$, any aspect of the "local structure" of the tree which is determined by the contour process segment of length 2m centered on that individual will converge in distribution to the same aspect of the local structure of $\tilde{\mathcal{T}}$. By taking m large and appealing to (8), we see that the proposition holds for $\mathcal{T}^+_{t_n,n,N_n}$.

To complete the proof it is enough to show that the proposition holds for the stopped tree $\mathcal{T}_{t_n,n,N_n}$. Unfortunately this does not follow directly from the unstopped case, because a non-negligible fraction of all individuals in $\mathcal{T}^+_{t_n,n,N_n}$ will be descendants of the n individuals alive at time $t_n$ after the origin. Instead, fix small $0 < \delta_1 < \delta_2$ and consider the segments of the contour process $C^+$ of $\mathcal{T}^+_{t_n,n,N_n}$ defined by:

$s_1$ is the segment of $C^+$ until its first upcrossing of $(1-\delta_1)t_n$,

$s_2$ is the segment of $C^+$ from the subsequent downcrossing of $(1-\delta_2)t_n$ until the next upcrossing of $(1-\delta_1)t_n$,

$s_3$ is the segment of $C^+$ from the subsequent downcrossing of $(1-\delta_2)t_n$ until the next upcrossing of $(1-\delta_1)t_n$,

...

$s_n$ is the final segment of $C^+$ after the final downcrossing of $t_n$.

Conditional on the event $A_{N_n}$, there is some conditional distribution of starting and ending positions for each segment. Given all these positions, each segment is distributed as an ERW process conditioned on having the first upcrossing of a certain level after a prescribed number of steps. The number of these segments is stochastically bounded as $n \to \infty$, so the probability of the conditioning event for each segment is still only polynomially small in the length of the segment. Thus separately on each segment we can show, as above, that the contour process satisfies (9) for $J_n$ uniform on that segment. Since these segments comprise (in the $n \to \infty$ limit) a proportion $1 - \varepsilon(\delta_1, \delta_2)$ of the entire contour process of $\mathcal{T}_{t_n,n,N_n}$, where $\varepsilon \to 0$ as $\delta_1, \delta_2 \to 0$, we can deduce the proposition for the stopped process $\mathcal{T}_{t_n,n,N_n}$. □

We now state (omitting the similar argument) the parallel local limit result for $\mathcal{T}_n$ as seen from a random extant individual. In this setting the relevant limit infinite tree, which we again call $\tilde{\mathcal{T}}$, is a variation of the $\tilde{\mathcal{T}}$ above, described as follows. The distinguished individual has Exponential(1) age at time 0. Its ancestors and their descendants are all as described before, except that now the infinite tree $\tilde{\mathcal{T}}$ is stopped at time 0.

Proposition 19. Let $I_n$ denote a uniform random extant species from the clade $\mathcal{T}_n$. Then as $n \to \infty$, for fixed $a > 0$,

$$\mathcal{T}_n(I_n, [-a, 0]) \xrightarrow{d} \tilde{\mathcal{T}}[-a, 0].$$

One can now make exact calculations of probabilities for the distinguished individual in $\tilde{\mathcal{T}}$, which represent the $n \to \infty$ limit results for a random extant individual in $\mathcal{T}_n$. Here is a simple example of possible calculations within $\tilde{\mathcal{T}}$.

Corollary 20. For the distinguished individual in $\tilde{\mathcal{T}}$:

(a) the probability that its parent is extant equals 1/2;

(b) the probability that some ancestor of it is extant equals $1 - e^{-1}$.

Proof. (a) The probability that the parent of the distinguished individual is alive at time 0 is simply $P(\xi_1 < \xi_2)$, where $\xi_1$ is the age of the distinguished individual and $\xi_2$ is the subsequent lifetime of its parent after the birth. Because $\xi_1$ and $\xi_2$ are independent Exponential(1) random times, we have $P(\xi_1 < \xi_2) = 1/2$ by symmetry.

(b) To calculate the probability that no ancestor of the distinguished individual is still alive, one only needs to note that the times (before the present) at which successive ancestors give birth to the next individual on the ancestral lineage form a Poisson process of rate 1, and an ancestor that gave birth at time s before the present has chance $e^{-s}$ to be extant. So the random number of extant ancestors has Poisson distribution with mean $\int_0^\infty e^{-s} \times 1\, ds = 1$, and thus takes the value 0 with probability $e^{-1}$. □
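Both parts of Corollary 20 can be confirmed by simulating only the ancestral line of the distinguished individual in the stopped tree $\tilde{\mathcal{T}}$ of Proposition 19; the side branches play no role. This Python sketch uses hypothetical names and cuts the ancestral line off at a finite horizon, an approximation whose error is at most $e^{-30}$ with the default setting.

```python
import math
import random
from typing import Tuple


def ancestor_flags(rng: random.Random, horizon: float = 30.0) -> Tuple[bool, bool]:
    """Walk up the ancestral line of the distinguished individual in the
    stopped infinite tree.  Returns (parent extant, some ancestor extant).

    s is the time before the present at which the current ancestor bore the
    next individual down the line; by memorylessness that ancestor is extant
    iff its Exponential(1) remaining lifetime exceeds s.  Ancestors beyond the
    horizon are ignored, which changes the answer by at most e^{-horizon}."""
    s = rng.expovariate(1.0)                       # age of the distinguished individual
    parent = rng.expovariate(1.0) > s              # parent's lifetime after that birth
    some = parent
    while not some and s <= horizon:
        s += rng.expovariate(1.0)                  # step back to the current ancestor's birth
        some = rng.expovariate(1.0) > s            # is the next ancestor up still alive?
    return parent, some


def estimate(trials: int = 50_000, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo estimates of the probabilities in Corollary 20 (a) and (b)."""
    rng = random.Random(seed)
    flags = [ancestor_flags(rng) for _ in range(trials)]
    return (sum(p for p, _ in flags) / trials,
            sum(a for _, a in flags) / trials)


if __name__ == "__main__":
    p_parent, p_some = estimate()
    print(round(p_parent, 3), "vs", 0.5)
    print(round(p_some, 3), "vs", round(1 - math.exp(-1), 3))
```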

6 Final remarks 

1. Our model of $\mathcal{T}_n$ and $\mathcal{A}_n$ has considerable variability between realizations. This can be seen mathematically in our distributional formulas (Corollary 14 in particular) and visually on our web site [Q. In one sense this variability is an artifact of the uniform prior on the time of origin, but it serves a useful purpose in emphasizing that radically different appearances of real-world trees might logically be just chance variation without biological significance.

2. Wollenberg et al study via simulation a model similar to ours (critical branching conditioned on n extant species) but handle the issue of time of origin in a different way, by taking it as the deterministic time $t_n$ which is the maximum likelihood estimator of the origin time. This is unrealistic in the opposite direction to that of the previous remark, since it underestimates variability. Our model extends more naturally to higher-order taxa.

3. Our model is qualitatively similar (in the sense of orders of magnitude) 
to the Moran model, for quantities which can be studied in the latter model. 
In fact the results involving local weak limits (sections 13.31 and 13.4(1 are 
exactly the same in our model as in a continuized Moran model, because 
our model converges (in the $n \to \infty$ limit) to the continuized Moran model
over time intervals (backwards from present) of length t = o(n). 

4. Neutral models like ours are unrealistic for large clades, by the following reasoning. For an n-species clade, our model gives (Corollary 5) the time of origin of the clade as order n time units ago. The time unit is the mean species lifetime, typically estimated as a few million years. Thus our model predicts the origin of an n-species clade as being at least n million years ago, which is known to be an overestimate for most clades of size $n > 100$.

5. The local point process limit in Corollary El is a simple instance of 
a general notion of local weak convergence of graphical structures
associated with point processes on $\mathbb{R}^d$ or abstract spaces. See E] for more
sophisticated examples. In particular, Proposition ^] fits the general set- 
ting of asymptotic fringe distributions which exist for many different models 
of random trees [2].

6. Mathematicians traditionally tend to regard pictures as mere visual 
aids to illustrate a logical argument. But the graphical representations we 
use in Figures 1 and 4 really comprise the essence of the mathematical 
argument, by relating our model of random trees to well-understood models 
of point processes or random walks. 